Nature Medicine

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match Nature Medicine's content profile, based on 117 papers previously published here. The average preprint has a 0.16% match score for this journal, so anything above that is already an above-average fit.

1
Human vs AI Clinical Assessment: Benchmarking a Multimodal Foundation Model Against Multi-Center Expert Judgment on the Mental Status Examination.

Mwangi, B.; Jabbar Abdl Sattar Hamoudi, H.; Sanches, M.; Dogan, N.; Chaudhary, P.; Wu, M.-J.; Zunta-Soares, G. B.; Soares, J. C.; Martin, A.; Soutullo, C. A.

2026-04-20 psychiatry and clinical psychology 10.64898/2026.04.17.26351105 medRxiv
Top 0.1%
52.1%

The Mental Status Examination (MSE) is the cornerstone of the psychiatric evaluation, yet validating artificial intelligence (AI) against the inherent variance of clinical judgment remains a critical bottleneck. Here we introduce a multi-center framework to benchmark the open-weight multimodal foundation model Qwen3-Omni against independent expert panels at two sites, UTHealth and Yale. Evaluating 396 classifications across 10 MSE domains and three longitudinal timepoints of increasing symptom severity, we found that experts achieved substantial agreement (Gwet's AC1 = 0.87), whereas the model achieved only moderate alignment (AC1 = 0.70-0.72). Even as the model's overall pathology prediction rate approximated the experts', the aggregate equilibrium masked a profound "clinical reasoning gap". Specifically, the model systematically over-predicted observable signs (e.g., speech, affect) while notably failing in inferential domains requiring the interpretation of latent mental content (e.g., delusions, perceptions). A 4-bit quantization analysis of the model confirmed this mechanistically: reducing model capacity disproportionately degraded inferential reasoning while preserving perceptual feature extraction. Furthermore, model-to-expert agreement degraded linearly as clinical complexity intensified across longitudinal visits (Accuracy: T0 = 84.8-87%; T1 = 80-82%; T2 = 71-73%), whereas expert consensus remained robust. Notably, model errors increased 2.3- to 3.4-fold where human experts disagreed. These findings establish inter-expert variance as an essential measurable baseline for psychiatric AI, demonstrating that true clinical translation requires models to move beyond multimodal perceptual extraction to achieve higher-order diagnostic reasoning.
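
The agreement statistic quoted in this abstract can be illustrated with a minimal sketch of Gwet's AC1 for two raters. This is not the authors' code: the function name `gwet_ac1`, the two-rater restriction, and the toy ratings are assumptions made purely for illustration.

```python
from typing import Hashable, Sequence


def gwet_ac1(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    categories: Sequence[Hashable],
) -> float:
    """Gwet's AC1 chance-corrected agreement for two raters (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    if len(categories) < 2:
        raise ValueError("the rating scale needs at least two categories")

    n = len(rater_a)
    q = len(categories)

    # Observed agreement: share of items on which the two raters match.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum of pi_k * (1 - pi_k) over categories, divided by
    # (q - 1), where pi_k is the average share of ratings falling in category k.
    p_e = 0.0
    for k in categories:
        pi_k = (list(rater_a).count(k) + list(rater_b).count(k)) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1

    return (p_a - p_e) / (1 - p_e)


# Hypothetical binary ratings (1 = pathology present, 0 = absent) for ten items.
expert = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
model = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
ac1 = gwet_ac1(expert, model, categories=[0, 1])
```

In this toy case the raters agree on nine of ten items, so the observed agreement is 0.9 and the chance term is 0.495; the preprint's own values would depend on its actual rating data.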

2
Integrative, and Scalable mental health phenotyping using a knowledge-graph-derived dual-metric framework

Sharma, A.; Bharadwaj, A.; Modi, S.; Ahuja, G.; Jain, A.; Kumar, K.

2026-03-16 psychiatry and clinical psychology 10.64898/2026.03.09.26347798 medRxiv
Top 0.1%
28.4%

Prevailing diagnostic instruments for anxiety and depression, though clinically indispensable, remain anchored to symptom-focused queries that assess patients directly about their affective states, while often neglecting the multidimensional architecture of daily living. Here, we introduce two complementary metrics, the Cognitive Attention Score (CAS) and C:ERR (Cognition-to-Emotional-Response Ratio), derived from yogic psychology and operationalized within a structured knowledge graph (Ceekr-KG) comprising 151,288 triples linking 354 discrete CAS levels, 26 continuous C:ERR values, and 80 clinical symptoms. Rather than interrogating disease phenotypes directly, these metrics are computed by capturing circadian, nutritional, and lifestyle factors that jointly regulate cognitive and emotional homeostasis. The hyperparameter-tuned Ceekr-KG model demonstrated high structural fidelity (Hits@1 = 97%, mean reciprocal rank = 0.98), substantially outperforming relation-preserving randomized controls, indicating that predictive performance arises from semantic structure rather than graph topology alone. CAS and C:ERR showed a strong positive association (Spearman's ρ = 0.787, p < 0.0001) but exhibited distinct distributional properties, with C:ERR displaying consistently stronger inverse correlations with symptom severity across domains (e.g., low energy: ρ = -0.85 versus -0.70 for CAS). Ordinal regression further showed that a combined CAS and C:ERR model outperformed either metric alone for most symptoms, indicating complementary and non-redundant contributions to clinical variance. Integration of Ceekr-KG into the independent Clinical Knowledge Graph improved predictive performance of widely used questionnaire-based assessment scales, demonstrating that yogic psychological frameworks encode clinically relevant semantic information.
Finally, longitudinal analysis of 249 individuals meeting predefined inclusion criteria (baseline CAS < 64 and >=2 assessments) across three therapeutic programmes revealed a mean CAS increase of +11.45 points (p < 0.001) and substantial migration from lower to higher functional bands, establishing Ceekr-KG as a validated digital phenotype for scalable mental health assessment.

3
Artificial Intelligence Agents in Mental Health: A Systematic Review and Meta Analysis

Zhu, L.; Wang, W.; Liang, Z.; Tan, W.; Chen, B.; Lin, X.; Wu, Z.; Yu, H.; Li, X.; Jiao, J.; He, S.; Dai, G.; Niu, J.; Zhong, Y.; Hua, W.; Chan, N. Y.; Lu, L.; Wing, Y. K.; Ma, X.; Fan, L.

2026-04-22 psychiatry and clinical psychology 10.64898/2026.04.21.26351365 medRxiv
Top 0.1%
27.4%

The rapid rise of large language models (LLMs) and foundation models has accelerated efforts to build artificial intelligence (AI) agents for mental health assessment, triage, psychotherapy support and clinical decision assistance. Yet a gap persists between healthcare and AI-focused work: while both communities use the language of "agents," clinical research largely describes monolithic chatbots, whereas AI studies emphasize agentic properties such as autonomous planning, multiagent coordination, tool and database use and integration with multimodal mental health data streams. In this Review, we conduct a systematic analysis of mental health AI agent systems from 2023 to 2025 using a six-dimensional audit framework: (i) system type (base model lineage, interface modality and workflow composition, from rule-based tools to role-aware multi-agent foundation-model systems), (ii) data scope (modalities and provenance, from elicited self-report and chatbot dialogues to electronic health records, biosensing and synthetic corpora), (iii) mental health focus (mapped to ICD-11 diagnostic groupings), (iv) demographics (age strata, geography and sex representation), (v) downstream tasks (screening/triage, clinical decision support, therapeutic interventions, documentation, ethical-legal support and education/simulation) and (vi) evaluation types (automated metrics, language quality benchmarks, safety stress tests, expert review and clinician or patient involvement). 
Across this corpus, we find that most systems (1) concentrate on depression, anxiety and suicidality, with sparse coverage of severe mental illness, neurocognitive disorders, substance use and complex comorbidity; (2) rely heavily on text-based self-report rather than clinically verified longitudinal data or genuinely multimodal inputs; (3) are implemented as single-agent chatbots powered by general-purpose LLMs rather than role-structured, workflow-integrated pipelines; and (4) are evaluated primarily via offline metrics or vignette-based scenarios, with few prospective, clinician- or patient-in-the-loop studies. At the same time, an emerging class of agentic systems assigns foundation models explicit roles as planners, retrieval agents, safety auditors or supervisors coordinating other models and tools. These multiagent, tool-augmented workflows promise personalization, safety monitoring and greater transparency, but they also introduce new risks around reliability, bias amplification, privacy, regulatory accountability and the blurring of clinical versus non-clinical roles. We conclude by outlining priorities for the next generation of mental health AI agents: clinically grounded, role-aware multi-agent architectures; transparent and privacy-preserving use of clinical and elicited data; demographic and cultural broadening beyond predominantly Western adult samples; and evaluation pipelines that progress from offline benchmarks to longitudinal, real-world studies with routine safety auditing and clear governance of responsibilities between agents and human clinicians.

4
Disentangling Symptom Heterogeneity in Large-Scale Psychiatric Text: Domain-Adapted vs. Instruction-Tuned Transformers

Varone, G.; Kumar, P.; Brown, J.; Boulila, W.

2026-02-26 psychiatry and clinical psychology 10.64898/2026.02.24.26347006 medRxiv
Top 0.1%
25.9%

Psychiatric disorders are fundamentally challenged by symptom heterogeneity, high comorbidity, and the absence of objective biomarkers, which together result in substantial variability in clinical assessment and treatment selection. Patient-generated language captures rich information about subjective experience and symptom severity, which can be systematically encoded and analyzed using computational models, making it a scalable signal for psychiatric assessment. We compare two approaches: (i) a domain-specialized transformer fine-tuned on clinical language, based on the Bio-ClinicalBERT encoder architecture, and (ii) a large-scale instruction-tuned generalist encoder (Instructor-XL) used as a frozen feature extractor with a shallow classification head. A corpus of N = 151,228 de-identified texts was compiled from five public sources, covering four psychiatric phenotypes: anxiety, depression, schizophrenia, and suicidal intention. Models were evaluated using stratified 10-fold cross-validation with cost-sensitive training, prioritizing imbalance-aware metrics, including Macro-F1 and Matthews Correlation Coefficient (MCC), over accuracy. Bio-ClinicalBERT achieved superior overall performance (Macro-F1 = 0.78, MCC = 0.6752), indicating more reliable separation of diagnostically overlapping affective categories. In contrast, Instructor-XL achieved its highest class-specific performance for schizophrenia (F1 = 0.798). Explainability analyses suggest that the domain-specialized model places greater weight on clinically relevant terms, whereas the generalist model relies on a broader set of lexical features.

5
Precision stratification of risk for suicidal behavior in people with bipolar depression

de Lacy, N.; Lam, W. Y.; Virtosu, M.; Deshmukh, V.; Wilson, F. A.; Pescosolido, B.; Smith, K. R.

2026-02-25 psychiatry and clinical psychology 10.64898/2026.02.23.26346921 medRxiv
Top 0.1%
25.7%

Patients with bipolar depression are at the highest risk for suicidal behavior, comprising ~10% of all deaths. In the critical period preceding attempts, most are not in contact with mental health professionals to effect antisuicidal strategies. There is an urgent need for decision support tools to help nonspecialist providers identify those at elevated risk to facilitate prevention. However, we lack robust, performant predictive models to form the core of such tools. Here, we build a high-precision predictive model of 30-day risk for suicidal behavior using unique electronic health record data from >220,000 patients with bipolar depression. We show that optimized machine learning approaches offer very strong clinical utility, delivering high Standardized Net Benefit in the context of near-perfect calibration and smooth, threshold-robust decision curves. Our results break the longstanding performance ceiling in suicide risk prediction and highlight the importance of training models for clinical utility as well as discriminative skill.

6
NeuroFM: Toward Precision Neuroimaging with Foundation Models for Individualized Brain Health Estimation

Dibble, A.; Dalby, C.; Sevegnani, M.; Fracasso, A.; Lyall, D. M.; Harvey, M.; Svanera, M.

2026-03-31 neurology 10.64898/2026.03.27.26349489 medRxiv
Top 0.1%
22.5%

Precision neuroimaging aims to deliver individualized assessments of brain health, yet a single structural MRI does not yield a multidimensional, quantitative summary of an individual's current health or future risk. Existing approaches optimize task-specific objectives, yielding representations entangled with cohort- or disease-specific signals rather than capturing biologically grounded patterns of anatomical variation. Here, we introduce NeuroFM, a foundation model trained exclusively on 100,000 healthy synthetic volumes to predict morphometric and demographic targets. Without exposure to diagnostic labels, NeuroFM organizes brain MRIs into population-level patterns that encode meaningful brain health differences. These representations transfer across five neuroscience domains without adaptation and support simple linear readouts for clinical, cognitive, developmental, socio-behavioural, and image quality control. Evaluated on 136,361 real volumes spanning multiple cohorts, NeuroFM generalizes across domains and enables individual-level brain health profiling, estimating future dementia risk years before diagnosis. Together, these findings establish a disease-naive foundation model paradigm for precision neuroimaging.

7
Domain-adapted language model using reinforcement learning for various dementias

Kowshik, S. S.; Jasodanand, V. H.; Bellitti, M.; Puducheri, S.; Xu, L.; Liu, Y.; Saichandran, K. S.; Dwyer, B. C.; Gabelle, A.; Hao, H.; Kedar, S.; Murman, D. L.; O'Shea, S.; Saint-Hilaire, M.-H.; Samudra, N. P.; Sartor, E. A.; Swaminathan, A.; Taraschenko, O.; Yuan, J.; Au, R.; Kolachalama, V. B.

2026-03-23 neurology 10.64898/2026.03.17.26348154 medRxiv
Top 0.1%
22.1%

Large language models excel at processing complex clinical data and advanced reasoning, yet domain-specific adaptation is essential to realize their full potential in fields such as Alzheimer's disease and related dementias (ADRD). Here, we present a generative language model for ADRD fine-tuned via reinforcement learning with verifiable rewards using a self-certainty-aware advantage. Model development and validation leveraged data from five ADRD cohorts, totaling 54,535 participants. Our framework integrates demographics, personal and family medical histories, medication use, neuropsychological test results, functional assessments, physical and neurological examination findings, laboratory data and multimodal neuroimaging to construct comprehensive clinical profiles. On held-out testing data involving 36,688 participants, our model achieved robust performance on syndromic classification, primary etiological diagnosis and biomarker prediction. Model predictions were validated against postmortem-confirmed diagnoses, and clinical utility was demonstrated in a controlled within-subjects crossover study where board-certified neurologists reviewed cases with and without model assistance, showing that exposure to model responses improved diagnostic performance. These results demonstrate that targeted domain adaptation with reinforcement learning can enable language models to deliver accurate, reasoning-driven support in ADRD evaluation. Prospective validation will be essential to translate these advances into improved patient outcomes.

8
Three Dimensions of Compounding Neglect: How Biobanks, Clinical Trials, and Scientific Literature Systematically Exclude the Global South

Corpas, M.; Freidin, M. B.; Valdivia-Silva, J.; Baker, S.; Fatumo, S.; Guio, H.

2026-02-11 public and global health 10.64898/2026.02.10.26346004 medRxiv
Top 0.1%
22.1%

Global health inequities are widely documented in outcomes. However, the research systems that generate knowledge, trials, and discovery have rarely been evaluated as an integrated structure. We introduce the Health Equity Informative Metrics (HEIM) framework, a three-dimensional audit of discovery (biobank output), translation (clinical trial activity), and knowledge (semantic organisation of the scientific literature). Analysing 70 international biobanks, 563,725 registered clinical trials, 13.1 million PubMed abstracts, and 175 Global Burden of Disease categories, we demonstrate that exclusion compounds systematically for diseases that primarily burden the Global South. No WHO-classified neglected tropical disease has generated a publication from these 70 biobanks. Clinical trial sites concentrate 2.5-fold in high-income countries relative to disease burden. Diseases disproportionately affecting low- and middle-income regions are 44% more semantically isolated from mainstream biomedical research than other conditions (P < 0.0001, Cohen's d = 1.80), limiting cross-disciplinary integration. Nine of the ten most neglected diseases across all dimensions disproportionately affect the Global South, and these disparities show no improvement over 26 years. By contrast, the trajectory of HIV/AIDS demonstrates that sustained, coordinated investment can reverse semantic isolation and integrate a once-marginalised disease into mainstream biomedical networks. HEIM reframes research inequity as a measurable, multi-stage enterprise and establishes a framework for health data accountability.

9
A Biopsychosocial Risk Score for Stratifying Disease Vulnerability in Healthy Populations: A Prospective Cohort and Multi-Omics Study in the UK Biobank

Chen, J.; Chu, C.; Garcia-Argibay, M.; Li, W.; Christogiannis, C.; Jia, T.; Walton, C.; Xie, S.; Yuan, T.; Cortese, S.; Liu, B.; Wang, J.

2026-02-10 public and global health 10.64898/2026.02.08.26345832 medRxiv
Top 0.1%
22.1%

Proactive identification of systemic vulnerability for disease(s) before clinical onset in healthy individuals is an ultimate goal of preventive and precision medicine, yet current tools remain largely disease-specific and fail to quantify latent vulnerability, an integrative measure of underlying health status, for early prevention and risk-stratified intervention. To address this, we developed the Risk Score for Disease Vulnerability (RS4DV) based on 85 accessible biopsychosocial measures, which was constructed using a Light Gradient Boosting Machine trained and validated in the UK Biobank (n = 391,193). Its capacity to capture pre-clinical vulnerability was subsequently evaluated in a held-out cohort free of baseline diagnoses (n = 35,193). Over a median follow-up of 14.7 years, baseline RS4DV stratified long-term health outcomes in the held-out cohort, in which high-risk individuals exhibited accelerated disease accumulation (HR = 2.53, 95% CI: 2.44-2.62) and elevated risk of all-cause mortality (HR = 4.03, 95% CI: 3.68-4.41). Multi-omics analyses further revealed that RS4DV captures signatures of systemic inflammation, metabolic dysregulation, and accelerated brain ageing, establishing its biological interpretability. To facilitate real-world translation, we developed a practically feasible version of the RS4DV with only six routinely accessible items, which maintained robust predictive fidelity relative to the full model. This light-version model demonstrated robust generalizability in the 1970 British Birth Cohort under a zero-shot paradigm. Collectively, RS4DV provides a biologically grounded and scalable tool for personalized risk assessment decades before clinical onset and early-stage risk stratification, enabling a paradigm shift toward proactive health management and precision prevention.

10
Genomic characterization of the 2024/2025 Mpox outbreak in Uganda

Kanyerezi, S.; Ayitewala, A.; Nsawotebba, A.; Makoha, C.; Tusabe, G.; Kabahita, J. M.; Oundo, H. R.; Seruyange, J.; Tenywa, W.; Were, S.; Murungi, M.; Nakintu, V.; Sserwadda, I.; Onywera, H.; Tanui, C.; Mugerwa, I.; Kagirita, A.; Lubwama, B.; Michael, E. R.; Kateete, D. P.; Otita, M.; Giduddu, S.; Jjingo, D.; Mboowa, G.; Ssemaganda, A.; Nabadda, S.; Tessema, S. K.; Ssewanyana, I.

2026-03-17 public and global health 10.64898/2026.03.16.26348494 medRxiv
Top 0.1%
21.8%

Mpox has historically been endemic in Central and West Africa, driven by recurrent zoonotic spillover events, but recent outbreaks in East Africa underscore its expanding geographic footprint. Despite this shift, genomic data from East Africa remain limited. We performed genomic characterization of the 2024/2025 Mpox outbreak in Uganda using PCR-confirmed monkeypox virus (MPXV) positive samples (n=511) from 44 districts, all achieving ≥70% genome coverage. To provide regional context, we incorporated 895 publicly available clade Ib MPXV genomes from GISAID, Pathoplexus, and NCBI. Phylogenetic analysis revealed two major clusters within clade Ib, each subdivided into two subclusters, indicating substantial viral diversification. Most Ugandan sequences clustered within the most genetically diverse subcluster. Additional Ugandan genomes were distributed across other subclusters, indicating co-circulation of multiple lineages. Cluster 1 was dominated by sequences from the Democratic Republic of Congo, while phylogeographic analysis identified multiple cross-border introductions into Uganda. These findings highlight the role of regional connectivity in shaping MPXV transmission and underscore the importance of integrated genomic surveillance and cross-border data sharing to inform outbreak response in East and Central Africa.

11
Development and retrospective validation of SCOUT: scalable clinical oversight of large language models via uncertainty triangulation

Ba, Z.; He, M.; He, H.; Fu, Q.; Lai, J.; Zhang, R.; Diao, X.; Liu, M.; Wang, Z.; Wang, X.; Zhao, S.; Zhu, Y.; Chen, H.; Qiu, Y.; Su, Q.; Xu, J.; Hu, F.; Luo, X.; Chen, H.; Zheng, M.; Xu, B.; Liu, J.; Guo, N.; Gao, X.; Wang, G.; Wu, Y.

2026-02-10 cardiovascular medicine 10.64898/2026.02.08.26345860 medRxiv
Top 0.1%
20.0%

Large language models (LLMs) are increasingly used in clinical workflows, yet requiring clinician review of every AI output negates the efficiency gains that motivate their adoption. We present SCOUT (Scalable Clinical Oversight via Uncertainty Triangulation), a model-agnostic meta-verification framework that selectively defers unreliable LLM predictions to clinicians by triangulating three orthogonal signals: model heterogeneity, stochastic inconsistency, and reasoning critique. In this retrospective development and validation study, we derived the framework on a discovery cohort (n = 405) and validated it across three clinically distinct tasks using 4 independent retrospective cohorts: coronary heart disease subtyping (n = 2,271), liver cancer screening from radiology reports (n = 3,373), and diseased coronary vessel counting (n = 286). SCOUT reduced the volume of cases requiring human review by 45% to 83%, with projected final accuracy of 99.1% to 100.0% assuming expert correction of all flagged cases. SCOUT provides a scalable, retrospectively validated approach for deploying generative AI in clinical medicine without compromising patient safety. Prospective randomized validation is underway to confirm real-world clinical utility.

12
Stabilized gp120-specific CD4 for next-generation HIV-1 inhibitors

Bahn-Suh, A. J.; Caldera, L. F.; Gnanapragasam, P. N. P.; Keeffe, J. R.; Seaman, M. S.; Bjorkman, P. J.; Mayo, S. L.

2026-03-27 bioengineering 10.64898/2026.03.24.713825 medRxiv
Top 0.1%
19.3%

HIV-1 Env's gp120 subunit uses the T-cell coreceptor CD4 to enter host cells in a manner that prevents the evolution of host resistance by sharing the binding epitope with the footprint of CD4's natural ligands, class II MHC proteins [1,2]. Consequently, CD4-containing biologics, such as CD4-Ig [3,4] and derivatives [5-9], benefit from this conserved relationship and are promising broad-acting anti-HIV-1 agents that are resistant to viral mutational escape [10]. However, these biologics suffer from short serum half-lives in humans [11,12] and animals [3,13], likely due to CD4's poor thermostability [14] and/or off-target class II MHC binding [15]. This latter property also warrants caution for CD4-containing biologics that could indiscriminately recruit Fc-dependent effector functions against uninfected cells and/or compete with host CD4 for class II MHC during T cell interactions with antigen-presenting cells. Here, we describe gp120-specific CD4 (gCD4), which exhibits enhanced thermostability and retains Env, but not class II MHC, binding. CD4-Ig variants incorporating gCD4 did not bind class II MHC on human B cells, displayed greater longevity in human tonsil organoid cultures, showed half-lives equivalent to therapeutic IgG antibodies in mice, and neutralized HIV-1 more broadly and potently compared to the original CD4-Ig molecules. Encouragingly, one variant neutralized 100% of a panel of clinically-relevant HIV-1 strains at titers correlating to infection prevention in humans, outperforming known broadly neutralizing antibodies [16,17]. Thus, gCD4 holds promise for the development of new CD4-containing biologics with best-in-class specificity, pharmacokinetic properties, and neutralization breadth and potency.

13
Evaluating Large Language Models for Assessment of Psychosis Risk

Zhu, T.; Tashevski, A.; Taquet, M.; Azis, M.; Jani, T.; Broome, M. R.; Kabir, T.; Minichino, A.; Murray, G. K.; Nour, M. M.; Singh, I.; Fusar-Poli, P.; Nevado-Holgado, A.; McGuire, P.; Oliver, D.

2026-04-04 psychiatry and clinical psychology 10.64898/2026.04.02.26349960 medRxiv
Top 0.1%
19.2%

Psychosis prevention relies on early detection of individuals at clinical high risk for psychosis (CHR-P), yet such detection remains limited, constraining preventive care. The effectiveness of the CHR-P state is constrained in part because clinical assessments require specialist interpretation of narrative interviews, limiting scalability. Here, we evaluate whether large language models (LLMs; deep learning models trained on large text corpora to process and generate language) can extract clinically meaningful information from such interviews to support psychosis risk assessment. We assessed 11 open-weight LLMs on 678 PSYCHS interview transcripts from 373 participants (77.7% CHR-P). Models inferred CHR-P status and estimated severity and frequency across 15 symptom domains, benchmarked against researcher-rated scores. Larger models achieved the strongest classification performance (Llama-3.3-70B: accuracy = 0.80, sensitivity = 0.93, specificity = 0.58). LLM-generated symptom scores showed good correlations with researcher-rated scores (ICC_sev = 0.74, ICC_freq = 0.75). Performance disparities were minimal across most demographic groups but varied across sites. Generated summaries were largely faithful to source transcripts, with low rates of clinically relevant confabulation (3%). Errors primarily reflected over-pathologisation of non-clinical experiences. While accuracy scaled with model size, smaller models achieved competitive performance with substantially lower computational cost. These findings demonstrate that open-weight LLMs can assess psychosis risk from clinical interview transcripts, supporting scalable, human-in-the-loop approaches to early detection.

14
CellSwarm: LLM-Driven Cell Agents Recapitulate Tumor Microenvironment Dynamics and Sense Indirect Genetic Perturbations

Meng, X.; Wang, T.; Dong, Z.; Li, X.; Cui, X.; Wang, L.

2026-02-26 systems biology 10.64898/2026.02.25.707926 medRxiv
Top 0.1%
18.8%

Agent-based models of the tumor microenvironment (TME) traditionally rely on hand-coded rules that cannot generalize beyond their programmed logic. Here we present CELLSWARM, a framework that replaces rule-based cell decision-making with large language model (LLM)-driven autonomous agents. Each simulated cell maintains persistent state, 14 signal pathways, and a memory stream, with an LLM serving as its cognitive core. Using structured knowledge bases for cancer-specific context, CELLSWARM recapitulates TNBC microenvironment composition with fidelity comparable to hand-coded rules (Jensen-Shannon divergence 0.144 vs. 0.146; P=0.012 vs. random, Mann-Whitney U test). Beyond matching rule-based performance, LLM-driven agents demonstrate three capabilities absent from rule-based models: cross-cancer generalization by swapping knowledge base entries, treatment response prediction concordant with clinical data (anti-PD-1: 17.6% simulated vs. 21% clinical), and sensing of indirect genetic perturbations that propagate through intermediate signaling cascades (IFN-γ KO: Agent +15.7% vs. Rules +0.3%; P=0.005). CELLSWARM demonstrates that LLM-driven cell agents can recapitulate and extend TME simulation beyond the reach of hand-coded rules.

15
Peer support boosted Hepatitis C treatment access among marginalised populations in England: A Bayesian causal factor analysis.

Schmidt, C.; Samartsidis, P.; Seaman, S.; Emmanouil, B.; Foster, G.; Reid, L.; Smith, S.; De Angelis, D.

2026-04-22 health policy 10.64898/2026.04.20.26351261 medRxiv
Top 0.1%
18.6%

To minimise health disparities, equitable access to medical treatment is paramount. In a pioneering intervention, National Health Service England's Hepatitis C virus (HCV) programme has implemented country-wide peer support to boost treatment access. Peer support workers (peers) are individuals with relevant lived experience, who promote testing and treatment in marginalised populations underserved by traditional health services. We evaluated the English peers intervention, exploiting its staggered rollout and rich surveillance data between June 2016 and May 2021. Peers increased HCV cases identified by 13·9% (95% credible interval (95% CrI) [5·3, 21·7]), sustained viral responses by 8·0% (95% CrI [-4·4, 18·6]), and drug services referrals by 8·8% (95% CrI [-12·5, 22·6]). The intervention's effectiveness was magnified during the first COVID-19 lockdown and individuals supported by peers typically belonged to populations with poor treatment access. Our findings indicate that peers can boost equity in treatment access on a national scale.

16
The Evolutionary Dynamics and Regional Spread of Mpox in Africa: Insights from Multi-country Genomic Surveillance

Tanui, C. K.; Kinganda-Lusamaki, E.; O'Toole, A.; Chitenje, M.; Campbell, A. K. O.; DIAGNE, M. M.; Kanyerezi, S.; Faye, M.; Ifabumuyi, S. O.; Nzoyikorera, N.; Lango, H. O.; Koukouikila-Koussounda, F.; Meite, S.; Sikazwe, E.; Djuicy, D. D.; Adu, B.; MAMAN, I.; Mapunda, L. A.; Nyan, D. C.; Stephane, S.; Aricha, S. A.; Cherif Gnimadi, T. A.; Maror, J. A.; Pereira, A. M.; Atrah, Y. S.; Akanbi, O. A.; Lokilo, E. L.; Makangara-Cigolo, J.-C.; Paku, P. T.; Luakanda, G. N.; Amuri-Aziza, A.; Wawina-Bokalanga, T.; Mugerwa, I.; Nsawotebba, A.; Ayitewala, A.; Williams, A. J.; Folorunso, V.; Mani, S.; Hardi

2026-04-11 infectious diseases 10.64898/2026.04.07.26347884 medRxiv
Top 0.1%
18.4%

The recent MPXV epidemic across Africa revealed extensive viral diversity and complex transmission dynamics, prompting a continent-wide genomic investigation. We analysed 3,450 high-quality MPXV whole genomes from 24 African Union Member States, revealing the complex and concurrent circulation of Subclades Ia, Ib, IIa, and IIb. Subclade Ia showed high levels of virus diversity in reservoir hosts in Central Africa, detected through zoonotic transmission and, most recently, some sustained human outbreaks. In contrast, Clade Ib exhibited signatures of sustained human-to-human transmission across Eastern and Southern Africa. Clade IIa remains largely zoonotic in West Africa. Like Ia, IIb shows continued zoonotic transmission, as well as sustained human outbreaks linked to lineage G1 and G2 circulation. Phylogeographic analyses revealed frequent cross-border transmission and interconnectedness, which was aligned with both human mobility corridors and international boundaries. For instance, the Democratic Republic of the Congo and Sierra Leone appear to emerge as sources of regional exportation, while the Cameroon-Nigeria, CAR-Cameroon and CAR-DRC interfaces reflected ongoing cross-border zoonotic spillovers. These findings underscore the need for harmonised genomic surveillance, APOBEC3-aware triage, and integrated One Health strategies to prevent local outbreaks from escalating into regional epidemics and to inform vaccine deployment and public health preparedness.

17
A Cerebral Frailty Risk Score Integrating Frailty Index and Neuroimaging for Dementia Prediction in the UK Biobank

Kan, C. N.; Chew, J.; Lim, W. S.; Tan, C. H.

2026-04-04 geriatric medicine 10.64898/2026.04.01.26350015 medRxiv
Top 0.1%
18.3%

Frailty is a multisystem clinical syndrome closely linked to cognitive aging, yet its cerebral underpinnings and co-contribution to adverse outcomes remain poorly understood. In 63,509 dementia-free UK Biobank participants (aged 65.0 ± 7.7 years), higher frailty index (FI) was associated with multiple neuroimaging markers, including reduced hippocampal volume, decreased cortical thickness, greater white matter hyperintensity burden, and impaired brain diffusion metrics. FI and neuroimaging markers additively increased the risks of incident dementia and mortality. An extreme gradient boosting with accelerated failure time framework highlighted FI and key regional neuroimaging features in dementia risk prediction (nested C-index=0.825, iAUC=0.759). Integrating the top 10 predictors into a novel point-based cerebral frailty risk score (CFRS) showed strong performance in predicting dementia onset (optimism-corrected C-index=0.838, iAUC=0.778), and was robust to the competing risk of mortality. These findings highlight the potential utility of a CFRS framework that integrates cumulative systemic and cerebral vulnerabilities for dementia risk stratification.

18
Structured retrieval closes the gap between low-cost and frontier clinical language models

Gorenshtein, A.; Sorka, M.; Omar, M.; Miron, K.; Hatav, A.; Barash, Y.; Klang, E.; Shelly, S.

2026-03-24 neurology 10.64898/2026.03.22.26349018 medRxiv
Top 0.1%
18.0%

Most clinical large language model (LLM) benchmarks rely on clean, concise vignettes that do not reflect the noisy, long-form documentation typical of real clinical records. How LLM performance degrades under realistic chart conditions remains poorly characterised. Here we test whether structured retrieval workflows protect National Institutes of Health Stroke Scale (NIHSS) scoring accuracy under systematic context stress. Using 100 de-identified acute stroke cases and a fully crossed 4 × 4 × 3 × 3 condition matrix (144 conditions per case), we vary context acquisition method, document length, distractor load and critical-information position across four Gemini models (57,047 retained runs). Structured retrieval reduces mean absolute error (MAE) from 4.58 to 2.96 points relative to non-agentic baselines (mean gain 1.62 MAE points; 95% CI 1.57 to 1.67; 35% relative reduction), with consistent gains across all 36 stress combinations. Lower-cost models show disproportionately larger gains (2.76 versus 0.45 MAE points). Tool-retrieved pipelines outperform retrieval-augmented generation in 33 of 36 combinations. These findings indicate that retrieval architecture, rather than model scale alone, is a tractable lever for robust, equitable clinical LLM deployment.
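
The counts quoted in this abstract can be cross-checked with a short sketch. It assumes the four factors have 4, 4, 3 and 3 levels in the order listed, and that each of the four models is run on every case and condition; the abstract states neither explicitly, and the level names below are placeholders.

```python
from itertools import product

# Placeholder level names: the abstract reports only the number of levels.
# Assumption: factors map to 4, 4, 3, 3 levels in the order they are listed.
acquisition_methods = [f"method_{i}" for i in range(4)]
document_lengths = [f"length_{i}" for i in range(4)]
distractor_loads = [f"distractors_{i}" for i in range(3)]
info_positions = [f"position_{i}" for i in range(3)]

# Fully crossed design: every combination of every factor level.
conditions = list(
    product(acquisition_methods, document_lengths, distractor_loads, info_positions)
)
n_conditions = len(conditions)

# Stress combinations exclude the acquisition method (the factor being compared).
n_stress = len(document_lengths) * len(distractor_loads) * len(info_positions)

# Assumption: every model sees every case under every condition.
n_cases, n_models = 100, 4
max_runs = n_cases * n_conditions * n_models
retained_runs = 57_047  # reported in the abstract
retained_fraction = retained_runs / max_runs

# Reported MAE values for non-agentic baselines vs structured retrieval.
baseline_mae, retrieval_mae = 4.58, 2.96
absolute_gain = baseline_mae - retrieval_mae
relative_reduction = absolute_gain / baseline_mae

print(n_conditions, n_stress, max_runs)
print(f"retained: {retained_fraction:.1%}")
print(f"gain: {absolute_gain:.2f} points, {relative_reduction:.0%} relative")
```

Under these assumptions the design yields 144 conditions per case and 36 stress combinations, matching the reported figures, and the 1.62-point gain corresponds to the stated 35% relative reduction.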

19
Greater lean-body-mass decline with tirzepatide than semaglutide in routine care, revealed by body-composition digital phenotyping

Murugadoss, K.; Venkatakrishnan, A.; Soundararajan, V.

2026-04-13 endocrinology 10.64898/2026.04.11.26350687 medRxiv
Top 0.1%
17.9%

GLP-1 receptor agonists induce substantial weight loss, but the extent to which lean tissue and physical function are preserved in routine care remains poorly understood. Using an EHR-linked body-composition digital phenotyping pipeline with LLM-based extraction, we performed a large-scale longitudinal analysis of 670,422 first-episode GLP-1RA users, including 456,742 treated with semaglutide and 213,680 treated with tirzepatide. Among these, 7,965 individuals with paired pre- and post-initiation body-composition measurements were analyzed over 12 months. Tirzepatide was associated with greater relative lean body mass (LBM) loss than semaglutide at each measured time point, with excess LBM losses of 1.1%, 1.5%, 1.3% and 2% at 3, 6, 9 and 12 months, respectively. A Depletive GLP-1 metabotype, defined as >20% total body weight (TBW) loss with >5% LBM loss, was significantly more frequent with tirzepatide than semaglutide during the first year of therapy (10.3% versus 6.7%, p<0.001). By contrast, a Prime GLP-1 metabotype, defined as >10% TBW loss with <5% LBM loss, was numerically more frequent with semaglutide than tirzepatide, but not significantly so (12.3% versus 11.8%, p=0.66). Higher drug dose and longer exposure were associated with progressively greater LBM decline in both treatment groups (both p<0.001). Among 3,746 examined EHR phenotypes, baseline musculoskeletal pain emerged as the most significant correlate of greater LBM loss (BH-adjusted q<0.001): cervicalgia (semaglutide, -4.1 percentage points; tirzepatide, -14.3 percentage points) and knee pain (semaglutide, -4.8 percentage points; tirzepatide, -13.4 percentage points), consistent with mobility-limited patients being more vulnerable to lean-tissue depletion during incretin therapy. 
Analysis of EHR notes for on-treatment functional features showed reduced exercise tolerance was the strongest correlate of greater LBM loss, increasing by 7.2 and 11.1 percentage points in semaglutide- and tirzepatide-treated patients, respectively. An independent analysis of all available single-cell RNA-seq data from human musculature showed broader GIPR+ cellular distribution than GLP1R+ cells across immune, stromal, vascular, and contractile compartments, providing plausible biological context for the greater LBM loss observed in routine care with tirzepatide (dual GLP1R-GIPR agonist) relative to semaglutide (GLP1R-specific agonist). In this observational study, greater weight-loss efficacy did not necessarily translate into more favorable body-composition outcomes, underscoring the need for clinical decision-making and trial designs that maximize each patient's likelihood of achieving a Prime GLP-1 metabotype.
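
The two metabotype definitions given in this abstract reduce to simple threshold rules, sketched below. This is an illustration only, not the authors' pipeline: the function name, the "Unclassified" label, and the assumption that both losses are expressed as percent of the baseline value are hypothetical.

```python
def classify_metabotype(tbw_loss_pct: float, lbm_loss_pct: float) -> str:
    """Apply the abstract's thresholds to one patient's 12-month changes.

    Assumption: both arguments are losses as a percentage of the baseline
    value (positive numbers mean loss). The abstract does not say how
    boundary values (exactly 5%, 10%, 20%) are handled, so strict
    inequalities are used as written there.
    """
    # Depletive: >20% total body weight loss together with >5% lean mass loss.
    if tbw_loss_pct > 20 and lbm_loss_pct > 5:
        return "Depletive"
    # Prime: >10% total body weight loss together with <5% lean mass loss.
    if tbw_loss_pct > 10 and lbm_loss_pct < 5:
        return "Prime"
    # Hypothetical label for everyone meeting neither definition.
    return "Unclassified"


# Illustrative inputs, not data from the study.
examples = [(22.0, 6.5), (12.0, 3.0), (8.0, 2.0), (25.0, 4.0)]
labels = [classify_metabotype(tbw, lbm) for tbw, lbm in examples]
print(labels)
```

The two categories cannot overlap because they place lean-mass loss on opposite sides of 5%, so the order of the checks does not affect the result.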

20
GLP-1R expression protects against 58 diseases but raises risk for 34 diseases and neonatal health

Campbell, R. H.; Mills, M. C.

2026-01-26 public and global health 10.64898/2026.01.24.26344404 medRxiv
Top 0.1%
17.2%

Cardiometabolic gene-target drugs such as Glucagon-Like Peptide 1 Receptor agonists are used extensively, yet risks and repurposing have not been systematically evaluated across a comprehensive set of diseases. We use Phenome-wide Mendelian Randomization to test GLP1R gene expression and Cholesteryl Ester Transfer Protein (CETP) concentration, two promising cardiometabolic drug targets, on risk of 396 diseases based on sex-specific, ancestry-specific, genome-wide association studies in UK Biobank. We identify 92 causal effects (66 novel) of genetically proxied GLP1R expression on disease, with 58 protective and 34 risk effects. Risks of GLP1R expression on neonatal health raise concern for women who may become pregnant. GLP1R expression substantially increases risk of Vitamin D deficiency. Although GLP1R is hypothesized as anti-ageing, we find it increases risk of 22 age-related diseases. Conversely, we find CETP inhibition is narrowly cardioprotective. Results show benefits, risks and repurposing opportunities for GLP1R-targeted and CETP-targeted drugs.